Results 1 - 20 of 22
1.
Diagnostics (Basel) ; 12(7)2022 Jul 15.
Article in English | MEDLINE | ID: mdl-35885630

ABSTRACT

INTRODUCTION: This study investigates whether it is possible to predict a final diagnosis based on a written nephropathological description (as a surrogate for image analysis) using various NLP methods. METHODS: For this work, 1107 unlabelled nephropathological reports were included. (i) First, after separating each report into its microscopic description and diagnosis section, the diagnosis sections were clustered, without supervision, into fewer than 20 diagnostic groups using different clustering techniques. (ii) Second, different text classification methods were used to predict the diagnostic group based on the microscopic description section. RESULTS: The best clustering results (i) could be achieved with HDBSCAN, using BoW-based feature extraction methods. Based on keywords, these clusters can be mapped to certain diagnostic groups. A transformer encoder-based approach as well as an SVM worked best regarding diagnosis prediction based on the histomorphological description (ii). Certain diagnosis groups reached F1-scores of up to 0.892 while others achieved weak classification metrics. CONCLUSION: While textual morphological description alone enables retrieving the correct diagnosis for some entities, it does not work sufficiently for other entities. This is in accordance with a previous image analysis study on glomerular change patterns, where some diagnoses are associated with one pattern, but for others, there exists a complex pattern combination.
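
The bag-of-words (BoW) feature extraction named in this abstract can be illustrated with a minimal sketch. The report snippets, the tokenizer, and the function names below are hypothetical and not taken from the study; the clustering step itself (HDBSCAN) is not reproduced here. The sketch only shows how short diagnosis texts could be turned into count vectors and compared.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical diagnosis sections (illustration only).
doc_1 = bag_of_words("IgA nephropathy with mesangial proliferation")
doc_2 = bag_of_words("Mesangial proliferation consistent with IgA nephropathy")
doc_3 = bag_of_words("Acute tubular injury")

print(cosine_similarity(doc_1, doc_2))  # high: the two texts share most tokens
print(cosine_similarity(doc_1, doc_3))  # 0.0: no shared tokens
```

A density-based clusterer would then group vectors that lie close together; which distance measure and parameters the authors used is not stated in the abstract.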

2.
Stud Health Technol Inform ; 278: 224-230, 2021 May 24.
Article in English | MEDLINE | ID: mdl-34042898

ABSTRACT

INTRODUCTION: The aim of this study is to evaluate the use of a natural language processing (NLP) software to extract medication statements from unstructured medical discharge letters. METHODS: Ten randomly selected discharge letters were extracted from the data warehouse of the University Hospital Erlangen (UHE) and manually annotated to create a gold standard. The AHD NLP tool, provided by MIRACUM's industry partner, was used to annotate these discharge letters. Annotations by the NLP tool were then compared to the gold standard on two levels: phrase precision (whether or not the whole medication statement has been identified correctly) and token precision (whether or not the medication name has been identified correctly within correctly discovered medication phrases). RESULTS: The NLP tool detected medication-related phrases with an overall F-measure of 0.852. The medication name has been identified correctly with an overall F-measure of 0.936. DISCUSSION: This proof-of-concept study is a first step towards an automated scalable evaluation system for MIRACUM's industry partner's NLP tool by using a gold standard. Medication phrases and names have been correctly identified in most cases by the NLP system. Future effort needs to be put into extending and validating the gold standard.
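
The comparison against a gold standard described in this abstract can be sketched as follows. This is a minimal illustration, assuming annotations are represented as (document id, start offset, end offset) tuples and that only exact matches count; the data and the function name are hypothetical and do not reflect the study's actual evaluation code.

```python
def precision_recall_f1(gold, predicted):
    """Score predicted annotations against a gold standard (exact match)."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # The F-measure (F1) is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical medication-phrase annotations: (document id, start, end).
gold = {("doc1", 10, 32), ("doc1", 50, 71), ("doc2", 5, 20), ("doc2", 40, 66)}
predicted = {("doc1", 10, 32), ("doc2", 5, 20), ("doc2", 40, 66), ("doc2", 80, 95)}

p, r, f1 = precision_recall_f1(gold, predicted)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

In this toy data three of four predictions are correct and three of four gold phrases are found, so precision, recall and F1 all equal 0.75.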


Subject(s)
Natural Language Processing , Patient Discharge , Humans , Software
3.
Ophthalmologe ; 118(3): 264-272, 2021 Mar.
Article in German | MEDLINE | ID: mdl-32725541

ABSTRACT

BACKGROUND: Anti-VEGF drugs are currently used to treat macular diseases. This has led to a wealth of additional data, which could help understand and predict treatment courses; however, this information is usually only available in free text form. OBJECTIVE: A retrospective study was designed to analyze how far interpretable information can be obtained from clinical texts by automated extraction. The aim was to assess the suitability of a text mining method that was customized for this purpose. MATERIAL AND METHODS: Data on 3683 patients were available, including 40,485 discharge letters. Some of the data of interest, e.g. visual acuity (VA), intraocular pressure (IOP) and accompanying diagnoses, were not only recorded textually but also entered in a database and could thus serve as a gold standard for text analysis. The text was analyzed using the Averbis Health Discovery text mining platform. To optimize the extraction task, rule knowledge and a German-language technical vocabulary linked to the international medical terminology standard Systematized Nomenclature of Medicine (SNOMED CT) were manually added. RESULTS: The correspondence between extracted data and the structured database entries is described by the F1 value. There was agreement of 94.7% for VA, 98.3% for IOP and 94.7% for the accompanying diagnoses. Manual analysis of noncorresponding cases showed that in 50% of these cases the text content did not match the database content for various reasons. After an adjustment, F1 values 1-3% above the previously determined values were obtained. CONCLUSION: Text mining procedures are very well suited for the considered discharge letter corpus and the problem described in order to extract contents from clinical texts in a structured manner for further evaluation.


Subject(s)
Data Mining , Systematized Nomenclature of Medicine , Databases, Factual , Electronic Health Records , Humans , Intraocular Pressure , Retrospective Studies
5.
Gesundheitswesen ; 82(S 02): S158-S164, 2020 Mar.
Article in English | MEDLINE | ID: mdl-31597185

ABSTRACT

BACKGROUND: Secondary data often contain unstructured free text. This work validates a text-mining system for extracting unstructured medical data for research purposes. METHODS: From a radiology clinic, 1000 of 7102 CT reports were randomly selected. Two physicians manually assigned these to defined finding groups. The text analysis software Averbis Extraction Platform (AEP) was used for automated keyword indexing and classification. Special features of the system include a morphological analysis for decomposing compound words as well as the recognition of noun phrases, abbreviations and negated statements. Based on the extracted standardized keywords, machine learning methods assign the reports to the predefined finding groups. To assess the reliability and validity of the automated procedure, the automated classification and 2 independent manual classifications are compared for agreement in several runs. RESULTS: Manual classification was too time-consuming. For automated keyword indexing, classification according to ICD-10 proved unsuitable for our data. Keyword search likewise did not deliver reliable results. Computer-assisted text mining in combination with machine learning led to reliable classifications. The inter-rater reliability between the two manual classifications, as well as between the automated and the manual classification, was very high. The two manual classifications agreed in 93% of all reports. The kappa coefficient was 0.89 [95% confidence interval (CI) 0.87-0.92]. The automated classification agreed with the independent second manual classification in 86% of all reports (kappa coefficient 0.79 [95% CI 0.75-0.81]). DISCUSSION: The classification by the AEP software was very good. In our study, however, it followed a systematic pattern. Most incorrect assignments occur in reports that indicate an increased cancer risk. The free-text structure of the reports raises concerns about the feasibility of a purely automated analysis. The combination of human intellect and intelligent, adaptive software appears to be the way forward for making unstructured but important text information accessible to research.
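
The agreement figures in this abstract (percent agreement and the kappa coefficient) can be related to each other with a short sketch of Cohen's kappa for two raters. The labels and ratings below are hypothetical, and the confidence intervals reported in the abstract are not computed here.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if both raters labelled independently
    # according to their own label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical finding groups assigned to ten reports by two raters.
rater_1 = ["normal"] * 5 + ["suspicious"] * 5
rater_2 = ["normal"] * 4 + ["suspicious"] * 5 + ["normal"]

print(round(cohens_kappa(rater_1, rater_2), 3))
```

Here the raters agree on 8 of 10 reports (80%) while chance agreement is 50%, which gives a kappa of 0.6. This is why a kappa value is always lower than the raw percent agreement it accompanies.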


Subject(s)
Medical Records , Semantics , Data Mining , Germany
6.
Stud Health Technol Inform ; 264: 83-87, 2019 Aug 21.
Article in English | MEDLINE | ID: mdl-31437890

ABSTRACT

Semantic standards and human language technologies are key enablers for semantic interoperability across heterogeneous document and data collections in clinical information systems. Data provenance is receiving increasing attention, and it is especially critical where clinical data are automatically extracted from original documents, e.g. by text mining. This paper demonstrates how the output of a commercial clinical text-mining tool can be harmonised with FHIR, the leading clinical information model standard. Character ranges that indicate the origin of an annotation and machine-generated confidence values were identified as crucial elements of data provenance in order to enrich text-mining results. We have specified and requested necessary extensions to the FHIR standard and demonstrated how, as a result, important metadata describing processes generating FHIR instances from clinical narratives can be embedded.


Subject(s)
Data Mining , Electronic Health Records , Delivery of Health Care , Humans , Metadata , Semantics
7.
Methods Inf Med ; 57(S 01): e82-e91, 2018 07.
Article in English | MEDLINE | ID: mdl-30016814

ABSTRACT

INTRODUCTION: This article is part of the Focus Theme of Methods of Information in Medicine on the German Medical Informatics Initiative. Similar to other large international data sharing networks (e.g. OHDSI, PCORnet, eMerge, RD-Connect), MIRACUM is a consortium of academic and hospital partners as well as one industrial partner in eight German cities which have joined forces to create interoperable data integration centres (DIC) and make data within those DIC available for innovative new IT solutions in patient care and medical research. OBJECTIVES: Sharing data shall be supported by common interoperable tools and services, in order to leverage the power of such data for biomedical discovery and moving towards a learning health system. This paper aims at illustrating the major building blocks and concepts which MIRACUM will apply to achieve this goal. GOVERNANCE AND POLICIES: Besides establishing an efficient governance structure within the MIRACUM consortium (based on the steering board, a central administrative office, the general MIRACUM assembly, six working groups and the international scientific advisory board), defining DIC governance rules and data sharing policies, as well as establishing (at each MIRACUM DIC site, but also for MIRACUM in total) use and access committees are major building blocks for the success of such an endeavor. ARCHITECTURAL FRAMEWORK AND METHODOLOGY: The MIRACUM DIC architecture builds on a comprehensive ecosystem of reusable open source tools (MIRACOLIX), which are linkable and interoperable amongst each other, but also with the existing software environment of the MIRACUM hospitals. Efficient data protection measures, considering patient consent, data harmonization and a MIRACUM metadata repository as well as a common data model are major pillars of this framework. The methodological approach for shared data usage relies on a federated querying and analysis concept. USE CASES: MIRACUM aims at proving the value of their DIC with three use cases: IT support for patient recruitment into clinical trials, the development and routine care implementation of a clinico-molecular predictive knowledge tool, and molecular-guided therapy recommendations in molecular tumor boards. RESULTS: Based on the MIRACUM DIC release in the nine-month conceptual phase, first large-scale analyses for stroke and colorectal cancer cohorts have been pursued. DISCUSSION: Beyond all technological challenges, successfully applying the MIRACUM tools to enrich our knowledge about diagnostic and therapeutic concepts, thus supporting the concept of a Learning Health System, will be crucial for acceptance and sustainability in the medical community and the MIRACUM university hospitals.


Subject(s)
Biomedical Research , Delivery of Health Care , Hospitals, University , Medical Informatics , Clinical Governance , Health Knowledge, Attitudes, Practice , Humans , Information Dissemination , Patient Selection , Policy , Search Engine
8.
Methods Inf Med ; 57(S 01): e92-e105, 2018 07.
Article in English | MEDLINE | ID: mdl-30016815

ABSTRACT

INTRODUCTION: This article is part of the Focus Theme of Methods of Information in Medicine on the German Medical Informatics Initiative. "Smart Medical Information Technology for Healthcare (SMITH)" is one of four consortia funded by the German Medical Informatics Initiative (MI-I) to create an alliance of universities, university hospitals, research institutions and IT companies. SMITH's goals are to establish Data Integration Centers (DICs) at each SMITH partner hospital and to implement use cases which demonstrate the usefulness of the approach. OBJECTIVES: To give insight into architectural design issues underlying SMITH data integration and to introduce the use cases to be implemented. GOVERNANCE AND POLICIES: SMITH implements a federated approach for both its governance structure and its information system architecture. SMITH has designed a generic concept for its data integration centers. They share identical services and functionalities to take best advantage of the interoperability architectures and of the data use and access process planned. The DICs provide access to the local hospitals' Electronic Medical Records (EMR). This is based on data trustee and privacy management services. DIC staff will curate and amend EMR data in the Health Data Storage. METHODOLOGY AND ARCHITECTURAL FRAMEWORK: To share medical and research data, SMITH's information system is based on communication and storage standards. We use the Reference Model of the Open Archival Information System and will consistently implement profiles of Integrating the Health Care Enterprise (IHE) and Health Level Seven (HL7) standards. Standard terminologies will be applied. The SMITH Market Place will be used for devising agreements on data access and distribution. 3LGM2 for enterprise architecture modeling supports a consistent development process. The DIC reference architecture determines the services, applications and the standards-based communication links needed for efficiently supporting the ingesting, data nourishing, trustee, privacy management and data transfer tasks of the SMITH DICs. The reference architecture is adopted at the local sites. Data sharing services and the market place enable interoperability. USE CASES: The methodological use case "Phenotype Pipeline" (PheP) constructs algorithms for annotations and analyses of patient-related phenotypes according to classification rules or statistical models based on structured data. Unstructured textual data will be subject to natural language processing to permit integration into the phenotyping algorithms. The clinical use case "Algorithmic Surveillance of ICU Patients" (ASIC) focusses on patients in Intensive Care Units (ICU) with the acute respiratory distress syndrome (ARDS). A model-based decision-support system will give advice for mechanical ventilation. The clinical use case HELP develops a "hospital-wide electronic medical record-based computerized decision support system to improve outcomes of patients with blood-stream infections" (HELP). ASIC and HELP use the PheP. The clinical benefit of the use cases ASIC and HELP will be demonstrated in a change of care clinical trial based on a stepped-wedge design. DISCUSSION: SMITH's strength is the modular, reusable IT architecture based on interoperability standards, the integration of the hospitals' information management departments and the public-private partnership. The project aims at sustainability beyond the first 4-year funding period.


Subject(s)
Delivery of Health Care , Information Technology , Algorithms , Clinical Governance , Communication , Decision Support Systems, Clinical , Electronic Health Records , Information Storage and Retrieval , Intensive Care Units , Models, Theoretical , Phenotype , Policy
9.
Stud Health Technol Inform ; 223: 93-9, 2016.
Article in English | MEDLINE | ID: mdl-27139390

ABSTRACT

The vast amount of clinical data in electronic health records constitutes a great potential for secondary use. However, most of this content consists of unstructured or semi-structured texts, which are difficult to process. Several challenges are still pending: medical language idiosyncrasies in different natural languages, and the large variety of medical terminology systems. In this paper we present SEMCARE, a European initiative designed to minimize these problems by providing a multi-lingual platform (English, German, and Dutch) that allows users to express complex queries and obtain relevant search results from clinical texts. SEMCARE is based on a selection of adapted biomedical terminologies, together with Apache UIMA and Apache Solr as open source state-of-the-art natural language pipeline and indexing technologies. SEMCARE has been deployed and is currently being tested at three medical institutions in the UK, Austria, and the Netherlands, showing promising results in a cardiology use case.


Subject(s)
Data Mining/methods , Electronic Health Records , Humans , Information Storage and Retrieval/methods , Language , Linguistics/methods , Natural Language Processing , Semantics
10.
Stud Health Technol Inform ; 212: 9-14, 2015.
Article in English | MEDLINE | ID: mdl-26063251

ABSTRACT

Patients with chronic diseases undergo numerous in- and outpatient treatment periods, and therefore many documents accumulate in their electronic records. We report on an on-going project focussing on the semantic enrichment of medical texts, in order to support recall-oriented navigation across a patient's complete documentation. A document pool of 1,696 de-identified discharge summaries was used for prototyping. A natural language processing toolset for document annotation (based on the text-mining framework UIMA) and indexing (Solr) was used to support a browser-based platform for document import, search and navigation. The integrated search engine combines free text and concept-based querying, supported by dynamically generated facets (diagnoses, procedures, medications, lab values, and body parts). The prototype demonstrates the feasibility of semantic document enrichment within document collections of a single patient. Originally conceived as an add-on for the clinical workplace, this technology could also be adapted to support personalised health record platforms, as well as cross-patient search for cohort building and other secondary use scenarios.


Subject(s)
Data Mining/methods , Electronic Health Records/organization & administration , Natural Language Processing , Semantics , User-Computer Interface , Vocabulary, Controlled , Machine Learning , Software , Web Browser
11.
Stud Health Technol Inform ; 192: 581-4, 2013.
Article in English | MEDLINE | ID: mdl-23920622

ABSTRACT

OBJECTIVE: In the context of past and current SNOMED CT translation projects we compare three kinds of SNOMED CT translations from English to German by: (t1) professional medical translators; (t2) a free Web-based machine translation service; (t3) medical students. METHODS: 500 SNOMED CT fully specified names from the (English) International release were randomly selected. Based on this, German translations t1, t2, and t3 were generated. A German and an Austrian physician rated the translations for linguistic correctness and content fidelity. RESULTS: Kappa for inter-rater reliability was 0.4 for linguistic correctness and 0.23 for content fidelity. Average ratings of linguistic correctness did not differ significantly between human translation scenarios. Content fidelity was rated slightly better for student translators compared to professional translators. Comparing machine to human translation, the linguistic correctness differed about 0.5 scale units in favour of the human translation and about 0.25 regarding content fidelity, equally in favour of the human translation. CONCLUSION: The results demonstrate that low-cost translation solutions of medical terms may produce surprisingly good results. Although we would not recommend low-cost translation for producing standardized preferred terms, this approach can be useful for creating additional language-specific entry terms. This may serve several important use cases. We also recommend testing this method to bootstrap a crowdsourcing process, by which term translations are gathered, improved, maintained, and rated by the user community.


Subject(s)
Artificial Intelligence , Internet , Natural Language Processing , Software , Systematized Nomenclature of Medicine , Terminology as Topic , Translating , Germany , Humans , Software Validation , United States
12.
Eur Radiol ; 22(12): 2750-8, 2012 Dec.
Article in English | MEDLINE | ID: mdl-22865274

ABSTRACT

OBJECTIVES: To create an advanced image retrieval and data-mining system based on in-house radiology reports. METHODS: Radiology reports are semantically analysed using natural language processing (NLP) techniques and stored in a state-of-the-art search engine. Images referenced by sequence and image number in the reports are retrieved from the picture archiving and communication system (PACS) and stored for later viewing. A web-based front end is used as an interface to query for images and show the results with the retrieved images and report text. Using a comprehensive radiological lexicon for the underlying terminology, the search algorithm also finds results for synonyms, abbreviations and related topics. RESULTS: The test set was 108 manually annotated reports analysed by different system configurations. Best results were achieved using full syntactic and semantic analysis with a precision of 0.929 and recall of 0.952. Operating successfully since October 2010, 258,824 reports have been indexed and a total of 405,146 preview images are stored in the database. CONCLUSIONS: Data-mining and NLP techniques provide quick access to a vast repository of images and radiology reports with both high precision and recall values. Consequently, the system has become a valuable tool in daily clinical routine, education and research. KEY POINTS: Radiology reports can now be analysed using sophisticated natural language-processing techniques. Semantic text analysis is backed by terminology of a radiological lexicon. The search engine includes results for synonyms, abbreviations and compositions. Key images are automatically extracted from radiology reports and fetched from PACS. Such systems help to find diagnoses, improve report quality and save time.


Subject(s)
Information Storage and Retrieval/methods , Radiology Information Systems , Algorithms , Data Mining , Humans , Natural Language Processing , Search Engine , Semantics , Software , User-Computer Interface
13.
Stud Health Technol Inform ; 169: 594-8, 2011.
Article in English | MEDLINE | ID: mdl-21893818

ABSTRACT

Incomplete coding is a known problem in hospital information systems. In order to detect non-coded secondary diseases we developed a text classification system which scans discharge summaries for drug names. Using a drug knowledge base in which drug names are linked to sets of ICD-10 codes, the system selects those documents in which a drug name occurs that is not justified by any ICD-10 code within the corresponding record in the patient database. Treatment episodes with missing codes for diabetes mellitus, Parkinson's disease, and asthma/COPD were subject to investigation in a large German university hospital. The precision of the method was 79%, 14%, and 45% respectively; roughly estimated recall values amounted to 43%, 70%, and 36%. Based on these data we predict roughly 716 non-coded diabetes cases, 13 non-coded Parkinson cases, and 420 non-coded asthma/COPD cases among 34,865 treatment episodes.
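
The selection rule described in this abstract (flag a document when a drug name occurs that no coded ICD-10 diagnosis justifies) can be sketched as below. The three-entry knowledge base, the tokenizer, and the function name are hypothetical stand-ins for illustration; the study's actual drug knowledge base and matching logic are not given in the abstract.

```python
import re

# Hypothetical drug knowledge base: drug name -> ICD-10 categories that justify it.
DRUG_TO_ICD10 = {
    "insulin": {"E10", "E11", "E12", "E13", "E14"},  # diabetes mellitus
    "levodopa": {"G20"},                             # Parkinson's disease
    "salbutamol": {"J44", "J45"},                    # COPD, asthma
}


def unjustified_drugs(summary_text, coded_icd10):
    """Return drugs mentioned in the text that no coded diagnosis justifies."""
    tokens = set(re.findall(r"[a-zäöüß]+", summary_text.lower()))
    # Compare on the three-character ICD-10 category, e.g. "J45.9" -> "J45".
    categories = {code[:3] for code in coded_icd10}
    flagged = []
    for drug, justifying in sorted(DRUG_TO_ICD10.items()):
        if drug in tokens and not (categories & justifying):
            flagged.append(drug)
    return flagged


summary = "Discharge medication: Insulin 10 IU, Salbutamol as needed."
print(unjustified_drugs(summary, {"J45.9", "I10"}))   # ['insulin']
print(unjustified_drugs(summary, {"E11.9", "J45.9"}))  # []
```

In the first call the asthma code justifies salbutamol but nothing justifies insulin, so the episode would be flagged as a candidate for a missing diabetes code. The low precision reported for some diseases suggests that such flags are candidates for review and not confirmed coding gaps.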


Subject(s)
Clinical Coding/methods , Data Mining/methods , Databases, Factual , Diagnosis-Related Groups , Patient Discharge , Algorithms , Asthma/classification , Diabetes Mellitus/classification , Electronic Health Records , Hospitals , Humans , Information Systems/organization & administration , Parkinson Disease/classification , Pulmonary Disease, Chronic Obstructive/classification , Terminology as Topic
14.
AMIA Annu Symp Proc ; : 647-51, 2008 Nov 06.
Article in English | MEDLINE | ID: mdl-18999064

ABSTRACT

MorphoSaurus, a concept-based document search engine, was incorporated into an EHR system in order to support search across the whole corpus of patient discharge letters and other clinically relevant documents. A user survey showed a general satisfaction with the system and revealed novel usages for information stored in discharge letters. The retrieval system was also used to identify relevant documents for a five-year retrospective survey of suspicious syphilis cases in the department. This retrieval scenario was used to assess the performance of MorphoSaurus against a manually created gold standard. A substring search for the German words "syphilis" and "lues" was used as baseline. The system yielded a precision p = 20.1% and a recall r = 100%. The values for the substring "syphilis" were p = 65.5% and r = 47.5%, for "lues" p = 15.4% and r = 87.7%. The results support the use of the proposed recall-oriented search across EHR documents to acquire valid and complete data for epidemiology studies in hospital populations.


Subject(s)
Attitude of Health Personnel , Consumer Behavior , Database Management Systems/statistics & numerical data , Documentation/methods , Information Storage and Retrieval/methods , Medical Records Systems, Computerized/statistics & numerical data , Pattern Recognition, Automated/methods , User-Computer Interface , Algorithms , Artificial Intelligence , Germany , Natural Language Processing , Surveys and Questionnaires
15.
Stud Health Technol Inform ; 129(Pt 1): 340-4, 2007.
Article in English | MEDLINE | ID: mdl-17911735

ABSTRACT

This paper describes how biomedical data mining can improve the use of hospital information systems. A large set of unstructured, narrative clinical data from a dermatological university hospital, such as discharge letters and other dermatological reports, was processed through a morpho-semantic text retrieval engine ("MorphoSaurus"), integrated with other clinical data using a web-based interface, and brought into daily clinical routine. The user evaluation showed very high user acceptance; the system seems to meet the clinicians' requirements for vertical data mining in electronic patient records. What emerges is the need to integrate biomedical data mining into hospital information systems for clinical, scientific, educational and economic reasons.


Subject(s)
Dermatology , Hospital Information Systems , Information Storage and Retrieval , Abstracting and Indexing , Attitude of Health Personnel , Consumer Behavior , Humans , Medical Records Systems, Computerized , Natural Language Processing , User-Computer Interface
16.
Stud Health Technol Inform ; 129(Pt 1): 392-6, 2007.
Article in English | MEDLINE | ID: mdl-17911746

ABSTRACT

We propose an approach to multilingual medical document retrieval in which complex word forms are segmented according to medically relevant morpho-semantic criteria. At its core lies a multilingual dictionary, in which entries are equivalence classes of subwords, i.e. semantically minimal units. Using two different standard test collections for the medical domain, we evaluate our approach for six languages covered by our system.
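
The idea of segmenting complex word forms into subwords that belong to language-independent equivalence classes can be illustrated with a toy sketch. The lexicon entries, the class identifiers, and the greedy longest-match strategy below are assumptions made for illustration; the abstract does not specify the actual dictionary or segmentation algorithm.

```python
# Hypothetical multilingual subword lexicon: subword -> equivalence class id.
SUBWORD_CLASSES = {
    "gastr": "#stomach", "magen": "#stomach", "stomach": "#stomach",
    "itis": "#inflamm", "entzuend": "#inflamm", "inflamm": "#inflamm",
    "hepat": "#liver", "leber": "#liver", "liver": "#liver",
}


def segment(word):
    """Greedy longest-match segmentation into equivalence class ids."""
    word = word.lower()
    classes = []
    i = 0
    while i < len(word):
        # Try the longest candidate starting at position i first.
        for j in range(len(word), i, -1):
            cls = SUBWORD_CLASSES.get(word[i:j])
            if cls:
                classes.append(cls)
                i = j
                break
        else:
            # No known subword starts here; skip one character.
            i += 1
    return classes


print(segment("Gastritis"))         # ['#stomach', '#inflamm']
print(segment("Magenentzuendung"))  # ['#stomach', '#inflamm']
```

Because the English/Latinate term and the German compound map to the same sequence of class identifiers, a query in one language could match documents in another when both are indexed at this level.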


Subject(s)
Information Storage and Retrieval/methods , Multilingualism , Natural Language Processing , Semantics , Translating
17.
Med Inform Internet Med ; 32(2): 131-47, 2007 Jun.
Article in English | MEDLINE | ID: mdl-17541863

ABSTRACT

This work presents a new dictionary-based approach to biomedical cross-language information retrieval (CLIR) that addresses many of the general and domain-specific challenges in current CLIR research. Our method is based on a multilingual lexicon that was generated partly manually and partly automatically, and currently covers six European languages. It contains morphologically meaningful word fragments, termed subwords. Using subwords instead of entire words significantly reduces the number of lexical entries necessary to sufficiently cover a specific language and domain. Mediation between queries and documents is based on these subwords as well as on lists of word-n-grams that are generated from large monolingual corpora and constitute possible translation units. The translations are then sent to a standard Internet search engine. This process makes our approach an effective tool for searching the biomedical content of the World Wide Web in different languages. We evaluate this approach using the OHSUMED corpus, a large medical document collection, within a cross-language retrieval setting.


Subject(s)
Information Storage and Retrieval/methods , Information Systems/organization & administration , Multilingualism , Algorithms , Humans , Semantics , Translating
18.
Stud Health Technol Inform ; 124: 857-62, 2006.
Article in English | MEDLINE | ID: mdl-17108620

ABSTRACT

We propose a method that aligns biomedical acronyms and their definitions across different languages. The approach is based upon a freely available tool for the extraction of abbreviations together with their expansions, and the subsequent normalization of language-specific variants, synonyms, and translations of the extracted acronym definitions. In this step, acronym expansions are mapped onto a language-independent concept-layer on which intra- as well as interlingual comparisons are drawn.


Subject(s)
Medical Informatics , Multilingualism , Semantics , Germany , Programming Languages
19.
AMIA Annu Symp Proc ; : 669-73, 2005.
Article in English | MEDLINE | ID: mdl-16779124

ABSTRACT

The pivotal role of the relation part-of in the description of living organisms is widely acknowledged. Organisms are open systems, which means that in contradistinction to mechanical artifacts they are characterized by a continuous flow and exchange of matter. A closer analysis of the spatial relations in biological organisms reveals that the decision as to whether a given particular is part-of a second particular or whether it is only contained-in the second particular may often be controversial. We here propose a rule-based approach which allows us to decide on the basis of well-defined criteria which of the two relations holds between two anatomical objects, given that one spatially includes the other. We discuss the advantages and limitations of this approach, using concrete examples from human anatomy.


Subject(s)
Classification/methods , Vocabulary, Controlled , Algorithms , Humans
20.
AMIA Annu Symp Proc ; : 933, 2005.
Article in English | MEDLINE | ID: mdl-16779220

ABSTRACT

We present a unique technique to create a multilingual biomedical dictionary, based on a methodology called Morpho-Semantic indexing. Our approach closes a gap caused by the absence of freely available multilingual medical dictionaries and the lack of accuracy of non-medical electronic translation tools. We first explain the underlying technology, followed by a description of the dictionary interface, which makes use of a multilingual subword thesaurus and of statistical information from a domain-specific, multilingual corpus.


Subject(s)
Dictionaries, Medical as Topic , Multilingualism , Abstracting and Indexing